Pearson correlation test (Pearson’s r)#

Goals#

  • Understand what Pearson’s correlation measures (and what it does not measure).

  • Run the hypothesis test for correlation and interpret the result.

  • Implement Pearson’s r and a permutation-based p-value using NumPy only.

  • Build intuition with Plotly visualizations (scatter, null distribution, sampling variability).

1) What is it used for?#

Pearson’s correlation coefficient r answers a very specific question:

How strong is the linear relationship between two numeric variables?

Typical uses:

  • Quick EDA: is there a linear trend between x and y?

  • Feature screening: does a feature move linearly with the target?

  • Model diagnostics: do residuals correlate with a predictor?

Common misuses / misconceptions:

  • r = 0 does not mean “no relationship” (it means no linear relationship).

  • Correlation is not causation.

  • Pearson r can be very sensitive to outliers.

2) The hypothesis test (what are we testing?)#

Let (\rho) be the population correlation between random variables (X) and (Y).

A standard Pearson correlation test (most common form):

  • Null: (H_0: \rho = 0) (no linear association)

  • Alternative:

    • two-sided: (H_1: \rho \neq 0)

    • one-sided: (H_1: \rho > 0) or (H_1: \rho < 0)

What does rejecting (H_0) mean?#

It means your sample provides evidence of a non-zero linear association.

It does not automatically mean:

  • the relationship is strong (effect size can be tiny but “significant” with large n)

  • the relationship is causal

  • the relationship is linear everywhere or stable over time
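The first bullet is easy to see numerically. Here is a quick NumPy-only sketch (variable names and constants are ours for illustration; it uses a normal approximation to the t tail, which is fine for very large n): with a huge sample, even a tiny true correlation produces a minuscule p-value.

```python
import math
import numpy as np

# A tiny effect can still be "significant" with large n.
# rho = 0.02 is a hypothetical true correlation, chosen for illustration.
rng = np.random.default_rng(0)
n, rho = 100_000, 0.02
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
x = z1
y = rho * z1 + math.sqrt(1 - rho**2) * z2  # corr(x, y) = rho by construction

r = float(np.corrcoef(x, y)[0, 1])
t = r * math.sqrt((n - 2) / (1 - r**2))
# Normal approximation to the two-sided t_{n-2} tail (accurate for huge n):
p = math.erfc(abs(t) / math.sqrt(2))
print(f"r = {r:.4f}, p ~ {p:.2e}")
```

The effect size r stays near 0.02 (practically negligible), yet the p-value is tiny: significance alone says nothing about strength.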

3) Assumptions (and why a permutation test is useful)#

The classic parametric Pearson test relies on a model where ((X, Y)) is approximately bivariate normal (or at least well-behaved) so that a t-statistic has a known reference distribution.

A permutation test provides an alternative that is often easier to reason about:

  • Under the null of no association, the pairing between x and y is arbitrary.

  • If we shuffle y many times and recompute r, we get a null distribution.

  • The p-value is the fraction of shuffles that produce a correlation as extreme as the observed one.

This permutation approach is:

  • NumPy-only and conceptually direct

  • valid under exchangeability (roughly: the pairs are i.i.d. and shuffling breaks any real association)
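The three steps above can be sketched in a few lines (a fuller implementation with input validation appears in the code cell below; here we use `np.corrcoef` and invented toy data so the snippet is self-contained):

```python
import numpy as np

# Permutation test sketch: shuffle y to break any real pairing with x.
rng = np.random.default_rng(7)
x = rng.normal(size=60)
y = 0.8 * x + rng.normal(size=60)  # toy data with a real association

r_obs = np.corrcoef(x, y)[0, 1]
r_null = np.array(
    [np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(2000)]
)
# "+1" in numerator and denominator keeps the estimate away from exactly 0:
p = (np.sum(np.abs(r_null) >= abs(r_obs)) + 1) / (r_null.size + 1)
print(f"r = {r_obs:.3f}, permutation p ~ {p:.4f}")
```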

import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)

rng = np.random.default_rng(42)

# Optional: SciPy for the classic parametric p-value (not required for the NumPy-only permutation test)
try:
    from scipy import stats
except Exception:
    stats = None

def pearson_r(x, y):
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()

    if x.shape != y.shape:
        raise ValueError(f"x and y must have the same shape; got {x.shape} vs {y.shape}")
    if x.size < 2:
        raise ValueError("Need at least 2 paired observations")

    x_c = x - x.mean()
    y_c = y - y.mean()

    ssx = float(np.dot(x_c, x_c))
    ssy = float(np.dot(y_c, y_c))
    if ssx == 0.0 or ssy == 0.0:
        raise ValueError("Correlation is undefined when one input is constant")

    return float(np.dot(x_c, y_c) / np.sqrt(ssx * ssy))


def ols_line(x, y):
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()

    x_mean = x.mean()
    y_mean = y.mean()

    x_c = x - x_mean
    denom = float(np.dot(x_c, x_c))
    if denom == 0.0:
        raise ValueError("Cannot fit a line when x is constant")

    b = float(np.dot(x_c, y - y_mean) / denom)
    a = float(y_mean - b * x_mean)
    return a, b


def pearson_t_stat(r, n):
    if n < 3:
        raise ValueError("Need n >= 3 for the t statistic")
    r = float(r)
    if abs(r) >= 1.0:
        return np.inf * np.sign(r)
    return r * np.sqrt((n - 2) / (1 - r**2))


def pearson_p_value_parametric(r, n, alternative="two-sided"):
    if stats is None:
        raise RuntimeError("SciPy is not available; use the permutation test below instead")

    t = pearson_t_stat(r, n)
    df = n - 2

    if alternative == "two-sided":
        return float(2 * stats.t.sf(abs(t), df))
    if alternative == "greater":
        return float(stats.t.sf(t, df))
    if alternative == "less":
        return float(stats.t.cdf(t, df))
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


def pearson_p_value_permutation(x, y, n_perm=10_000, alternative="two-sided", rng=None):
    if rng is None:
        rng = np.random.default_rng()

    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")

    r_obs = pearson_r(x, y)

    r_perm = np.empty(int(n_perm), dtype=float)
    for i in range(r_perm.size):
        r_perm[i] = pearson_r(x, rng.permutation(y))

    if alternative == "two-sided":
        extreme = np.abs(r_perm) >= abs(r_obs)
    elif alternative == "greater":
        extreme = r_perm >= r_obs
    elif alternative == "less":
        extreme = r_perm <= r_obs
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

    p = float((extreme.sum() + 1) / (r_perm.size + 1))
    return r_obs, p, r_perm


def bootstrap_ci_r(x, y, n_boot=5_000, ci=0.95, rng=None):
    if rng is None:
        rng = np.random.default_rng()

    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")

    n = x.size
    if n < 2:
        raise ValueError("Need at least 2 paired observations")

    idx = rng.integers(0, n, size=(int(n_boot), n))
    xb = x[idx]
    yb = y[idx]

    xb_c = xb - xb.mean(axis=1, keepdims=True)
    yb_c = yb - yb.mean(axis=1, keepdims=True)

    num = np.sum(xb_c * yb_c, axis=1)
    den = np.sqrt(np.sum(xb_c**2, axis=1) * np.sum(yb_c**2, axis=1))
    r_boot = num / den

    alpha = (1 - ci) / 2
    lo, hi = np.quantile(r_boot, [alpha, 1 - alpha])
    return float(lo), float(hi), r_boot

4) Example: compute r and visualize the linear relationship#

We’ll generate a simple synthetic dataset with a positive linear relationship plus noise.

n = 80
x = rng.normal(0, 1, size=n)
y = 0.9 * x + rng.normal(0, 1.2, size=n)

r_obs = pearson_r(x, y)
a, b = ols_line(x, y)

x_grid = np.linspace(x.min(), x.max(), 200)
y_hat = a + b * x_grid

fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, mode="markers", name="data", marker=dict(size=9, opacity=0.8)))
fig.add_trace(go.Scatter(x=x_grid, y=y_hat, mode="lines", name="OLS fit", line=dict(width=3)))
fig.update_layout(
    title=f"Scatter plot with fitted line (Pearson r = {r_obs:.3f})",
    xaxis_title="x",
    yaxis_title="y",
)
fig.show()

Interpreting r#

  • Sign: r > 0 means y tends to increase when x increases; r < 0 means the opposite.

  • Magnitude: |r| closer to 1 means points lie closer to a straight line.

A useful identity:

[ r = \frac{\operatorname{cov}(x,y)}{\operatorname{sd}(x)\,\operatorname{sd}(y)} ]

So Pearson r is a standardized covariance:

  • invariant to shifting and scaling of either variable

  • sensitive to outliers (because covariance is)
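The invariance claim is easy to verify directly (the shift and scale constants below are arbitrary; we use `np.corrcoef`, which computes the same quantity as our `pearson_r`, so the snippet stands alone):

```python
import numpy as np

# Pearson r is unchanged by shifting and (positively) scaling either variable.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 2.0 * x + rng.normal(size=50)

r1 = np.corrcoef(x, y)[0, 1]
r2 = np.corrcoef(3.0 * x + 10.0, 0.5 * y - 7.0)[0, 1]  # affine transforms
print(f"r1 = {r1:.6f}, r2 = {r2:.6f}")  # identical up to rounding
```

Note that a negative scale factor flips the sign of r, while its magnitude is still unchanged.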

5) Running the hypothesis test#

We’ll compute a p-value in two ways:

  1. Permutation test (NumPy only) — our low-level, assumption-light implementation.

  2. Parametric t-test (optional, via SciPy) — the classical Pearson correlation test.

Classic parametric Pearson test (t distribution)#

For paired observations ((x_i, y_i)) with sample size (n), compute the sample correlation (r). Under the classical assumptions (especially approximate bivariate normality) and (H_0: \rho = 0), the test statistic

[ t = r \sqrt{\frac{n-2}{1-r^2}} ]

has a Student t distribution with (n-2) degrees of freedom.

  • two-sided:
    (p = 2\,P(T_{n-2} \ge |t|))

  • one-sided: use the appropriate tail depending on
    (H_1)

We include this mainly for interpretation and for cross-checking against libraries; the NumPy-only permutation test above does not require this reference distribution.

alpha = 0.05

r_obs, p_perm, r_perm = pearson_p_value_permutation(x, y, n_perm=20_000, alternative="two-sided", rng=rng)
print(f"Observed r = {r_obs:.4f}")
print(f"Permutation p-value (two-sided) = {p_perm:.4g}")

if stats is not None:
    p_param = pearson_p_value_parametric(r_obs, n=x.size, alternative="two-sided")
    t_stat = pearson_t_stat(r_obs, n=x.size)
    print(f"t statistic = {t_stat:.4f} (df={x.size-2})")
    print(f"Parametric p-value (two-sided) = {p_param:.4g}")
else:
    print("SciPy not available: skipping parametric p-value")

print(f"Decision at alpha={alpha}: {'reject H0' if p_perm < alpha else 'fail to reject H0'}")
Observed r = 0.6010
Permutation p-value (two-sided) = 5e-05
t statistic = 6.6416 (df=78)
Parametric p-value (two-sided) = 3.756e-09
Decision at alpha=0.05: reject H0

Visual: permutation null distribution#

Under the null (no association), shuffled pairings should produce correlations near 0. The p-value is the fraction of shuffled correlations that are at least as extreme as the observed one.

thr = abs(r_obs)
mask_extreme = np.abs(r_perm) >= thr

fig = go.Figure()
fig.add_trace(
    go.Histogram(
        x=r_perm,
        nbinsx=60,
        name="permuted r (null)",
        marker=dict(color="rgba(120,120,120,0.6)"),
        histnorm="probability density",
    )
)
fig.add_trace(
    go.Histogram(
        x=r_perm[mask_extreme],
        nbinsx=60,
        name=f"|r| >= {thr:.3f} (counts toward p)",
        marker=dict(color="rgba(220,0,0,0.65)"),
        histnorm="probability density",
    )
)

fig.add_vline(x=r_obs, line_width=3, line_color="black")
fig.add_vline(x=-r_obs, line_width=3, line_color="black", line_dash="dash")
fig.update_layout(
    barmode="overlay",
    title=f"Permutation null distribution (p ≈ {p_perm:.4g})",
    xaxis_title="Pearson r under H0 (shuffled)",
    yaxis_title="density",
)
fig.show()

What the p-value means (and doesn’t mean)#

A (two-sided) p-value answers:

If (\rho = 0) (no linear association), how often would we see a sample correlation at least as extreme as the one we observed?

It is not:

  • the probability that (H_0) is true

  • the probability that the relationship is “real”

  • a measure of effect size (that is r itself)

A good report includes both:

  • effect size: r

  • uncertainty: CI and/or p-value

6) Confidence intervals (uncertainty on the effect size)#

A p-value tells you about compatibility with (\rho = 0). A confidence interval tells you a range of plausible (\rho) values.

We’ll compute a bootstrap percentile CI for r by resampling the paired observations.

lo, hi, r_boot = bootstrap_ci_r(x, y, n_boot=8_000, ci=0.95, rng=rng)
print(f"Bootstrap 95% CI for r: [{lo:.3f}, {hi:.3f}]")

fig = go.Figure()
fig.add_trace(go.Histogram(x=r_boot, nbinsx=60, histnorm="probability density", name="bootstrap r"))
fig.add_vline(x=r_obs, line_width=3, line_color="black")
fig.add_vline(x=lo, line_width=3, line_color="green")
fig.add_vline(x=hi, line_width=3, line_color="green")
fig.update_layout(
    title="Bootstrap distribution of r with 95% CI",
    xaxis_title="bootstrap Pearson r",
    yaxis_title="density",
)
fig.show()
Bootstrap 95% CI for r: [0.440, 0.725]

7) Pitfall: outliers can dominate Pearson r#

Because Pearson r is based on means, variances, and covariance, a single extreme point can strongly change it.

We’ll compare the same dataset with and without one high-leverage outlier.

x0 = rng.normal(0, 1, size=40)
y0 = 0.5 * x0 + rng.normal(0, 1, size=40)

r0 = pearson_r(x0, y0)

# Add one outlier
x1 = np.append(x0, 8.0)
y1 = np.append(y0, -8.0)

r1 = pearson_r(x1, y1)

fig = make_subplots(rows=1, cols=2, subplot_titles=(f"No outlier (r={r0:.3f})", f"With outlier (r={r1:.3f})"))
fig.add_trace(go.Scatter(x=x0, y=y0, mode="markers", marker=dict(size=9, opacity=0.8), showlegend=False), row=1, col=1)
fig.add_trace(go.Scatter(x=x1, y=y1, mode="markers", marker=dict(size=9, opacity=0.8), showlegend=False), row=1, col=2)
fig.update_xaxes(title_text="x")
fig.update_yaxes(title_text="y")
fig.update_layout(title="A single point can change correlation a lot")
fig.show()

8) Pitfall: a strong relationship can have r ≈ 0 (nonlinear patterns)#

Pearson r measures linear association.

Example: if y = x^2 and x is symmetric around 0, the relationship is strong but not linear, and Pearson r can be close to 0.

x_nl = rng.uniform(-2.5, 2.5, size=200)
y_nl = x_nl**2 + rng.normal(0, 0.2, size=x_nl.size)

r_nl = pearson_r(x_nl, y_nl)

fig = go.Figure()
fig.add_trace(go.Scatter(x=x_nl, y=y_nl, mode="markers", marker=dict(size=7, opacity=0.7)))
fig.update_layout(
    title=f"Nonlinear relationship (Pearson r = {r_nl:.3f} can be misleading)",
    xaxis_title="x",
    yaxis_title="y",
)
fig.show()

9) Sampling variability: r depends on n#

Even if the true correlation is (\rho = 0), sample correlations won’t be exactly 0. As n increases, the sampling distribution of r tightens around the true value.

def simulate_r_under_rho(rho, n, n_sims=4000, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    z1 = rng.normal(size=(n_sims, n))
    z2 = rng.normal(size=(n_sims, n))
    x = z1
    y = rho * z1 + np.sqrt(1 - rho**2) * z2

    x_c = x - x.mean(axis=1, keepdims=True)
    y_c = y - y.mean(axis=1, keepdims=True)
    num = np.sum(x_c * y_c, axis=1)
    den = np.sqrt(np.sum(x_c**2, axis=1) * np.sum(y_c**2, axis=1))
    return num / den

rho = 0.0
r_n20 = simulate_r_under_rho(rho, n=20, n_sims=4000, rng=rng)
r_n80 = simulate_r_under_rho(rho, n=80, n_sims=4000, rng=rng)
r_n300 = simulate_r_under_rho(rho, n=300, n_sims=4000, rng=rng)

fig = go.Figure()
fig.add_trace(go.Violin(y=r_n20, name="n=20", box_visible=True, meanline_visible=True))
fig.add_trace(go.Violin(y=r_n80, name="n=80", box_visible=True, meanline_visible=True))
fig.add_trace(go.Violin(y=r_n300, name="n=300", box_visible=True, meanline_visible=True))
fig.update_layout(
    title="Sampling distribution of r when true rho = 0",
    yaxis_title="sample Pearson r",
)
fig.show()

10) How to report and interpret results#

A good report includes:

  • n (sample size)

  • r (effect size)

  • a confidence interval for (\rho) (recommended)

  • p-value and the alternative hypothesis

Example phrasing:

“We found a positive linear association between X and Y (Pearson r = 0.42, 95% bootstrap CI [0.20, 0.60], permutation p = 0.003, n = 80).”

Interpretation checklist#

  • Is the association linear (check a scatter plot)?

  • Are there outliers or clusters driving the correlation?

  • Is the result practically meaningful (not just statistically significant)?

  • Could a confounder explain both variables?

11) Practical usage (SciPy)#

If SciPy is available, scipy.stats.pearsonr computes:

  • Pearson r

  • the classical (parametric) p-value under the t-test reference distribution

We’ll compare it to our NumPy computations.

if stats is None:
    print("SciPy not available")
else:
    r_scipy, p_scipy = stats.pearsonr(x, y)
    print(f"SciPy pearsonr: r={r_scipy:.4f}, p={p_scipy:.4g}")
    print(f"NumPy pearson_r: r={pearson_r(x, y):.4f}")
SciPy pearsonr: r=0.6010, p=3.756e-09
NumPy pearson_r: r=0.6010

Exercises#

  1. Create a dataset where Pearson r is near 0 but there is a strong nonlinear relationship. Plot it.

  2. Simulate correlated data for (\rho \in \{0, 0.2, 0.5\}) and compare how often you reject (H_0) at alpha=0.05 as n increases.

  3. Add a single outlier to a moderately correlated dataset and measure how much r changes.

  4. Implement Spearman correlation (rank-based) using NumPy and compare it on the nonlinear example.

References#

  • Student’s t relationship for Pearson correlation: standard results in most mathematical statistics texts.

  • Permutation tests / randomization tests: Fisher (1935), Good (2005), and many modern applied stats resources.

  • SciPy documentation: scipy.stats.pearsonr.